Robust Spoken Document Retrieval Based on Multilingual Subphonetic Segment Recognition

Authors

  • Shi-wook Lee
  • Kazuyo Tanaka
  • Yoshiaki Itoh
Abstract

This paper describes the development and application of a subphonetic segment recognition system for spoken document retrieval. It builds on an open-vocabulary spoken document retrieval system in which retrieval is carried out in the symbolic domain by measuring the distance between portions of the subphonetic segment sequences produced by pattern recognition in the acoustic domain. The system proposed here performs matching on subphonetic segments, a more basic unit than semantic units such as words. As a result, it is not constrained by vocabulary or grammar and can readily be extended to multilingual tasks. This paper presents the proposed spoken document retrieval system, including the proposed subphonetic segment recognition scheme, and evaluates its performance and feasibility through experiments on multilingual retrieval tasks.
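
The abstract does not specify which distance measure is used in the symbolic domain. As a hedged illustration only, the sketch below assumes that every spoken document has already been recognized into a sequence of subphonetic segment labels, that a query is converted into the same label set, and that documents are scored by the smallest edit distance between the query and any contiguous span of the document (approximate substring matching with uniform costs). The function names `segment_match_distance` and `rank_documents`, the cost scheme, and the segment labels are all hypothetical and are not the authors' method.

```python
from collections.abc import Sequence

# Hedged sketch: uniform edit costs and all names here are assumptions,
# not the method of the paper.


def segment_match_distance(query: Sequence[str], document: Sequence[str]) -> tuple[int, int]:
    """Smallest edit distance between `query` and any contiguous span of `document`.

    Approximate substring matching: row 0 of the DP table is all zeros, so a
    match may begin at any document position without penalty. Returns
    (distance, end) where `end` is the exclusive end index of the best span.
    """
    m, n = len(query), len(document)
    prev = [0] * (n + 1)  # D[0][j] = 0: free start anywhere in the document
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # D[i][0] = i: query prefix vs. empty span
        for j in range(1, n + 1):
            sub = 0 if query[i - 1] == document[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,        # query segment left unmatched (deletion)
                curr[j - 1] + 1,    # extra document segment (insertion)
                prev[j - 1] + sub,  # match or substitution
            )
        prev = curr
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], end


def rank_documents(query: Sequence[str], documents: dict[str, Sequence[str]]) -> list[tuple[str, float]]:
    """Rank documents by match distance normalized by query length (lower is better)."""
    scored = []
    for doc_id, segments in documents.items():
        dist, _ = segment_match_distance(query, segments)
        scored.append((doc_id, dist / max(len(query), 1)))
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical segment labels; real subphonetic segment inventories differ.
query = ["k", "a", "t", "o"]
documents = {
    "doc1": ["s", "a", "k", "a", "t", "o", "n"],
    "doc2": ["m", "i", "r", "u", "k", "u"],
}
print(rank_documents(query, documents))  # [('doc1', 0.0), ('doc2', 0.75)]
```

Because matching happens on segment labels rather than on words, a query containing an out-of-vocabulary term can still be located inside a document, which is the open-vocabulary property the abstract describes. A real system would likely replace the uniform costs with costs derived from segment confusability.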

Similar resources

Open-Vocabulary Spoken Document Retrieval Based On Multilingual Subphonetic Segment Recognition

This paper describes the development and application of a subphonetic segment recognition system for spoken document retrieval. Following from the development of an open-vocabulary spoken document retrieval system, where the retrieval process is accomplished in the symbolic domain by measuring the distance between the parts of subphonetic segment results from pattern recognition in the acoustic...

Multilayer subword units for open-vocabulary spoken document retrieval

This paper describes the application of subword units in an effort to improve open-vocabulary spoken document retrieval performance in the case of highly corrupted recognition output. This paper presents the developed open-vocabulary spoken document retrieval system including the newly proposed subphonetic segment unit and combining multilayer subword units. Our experiments on Japanese spoken...

A robust fusion method for multilingual spoken document retrieval systems employing tiered resources

In this study, we present two novel fusion approaches to merge subword and word based retrieval methods within a multilingual spoken document retrieval (SDR) system. Considering the fact that more than 6000 languages are spoken in the world today, resources (e.g., text and audio data, pronunciation lexicon) needed to develop Automatic Speech Recognition (ASR) systems for such a range of languag...

ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval

ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments...

Towards High Performance Phonotactic Feature for Spoken Language Recognition

With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...

Publication date: 2004